Abstract

A square is the concatenation of a nonempty word with itself. A word has period p if its letters at distance p match. The exponent of a nonempty word is the quotient of its length over its smallest period.

In this article we give a proof of the fact that there exists an infinite binary word which contains finitely many squares and simultaneously avoids words of exponent larger than 7/3.

Our infinite word contains 12 squares, which is the smallest possible number of squares to get the property, and 2 factors of exponent 7/3. These are the only factors of exponent larger than 2.

The value 7/3 introduces what we call the finite-repetition threshold of the binary alphabet. We conjecture it is 7/4 for the ternary alphabet, like its repetitive threshold.

Keywords: combinatorics on words, repetitions, word morphisms.

MSC: 68R15 Combinatorics on words. 

1 Introduction



The study of repetitions in words is a basic topic in Theoretical Informatics. It is related to many applications, although it was first studied by Thue at the beginning of the twentieth century with a purely theoretical objective. Related results apply to the design of efficient string pattern matching algorithms, to text compression methods and entropy analysis, as well as to the study of repetitions in biological molecular sequences, among others.

The knowledge of the strongest constraints an infinite word can tolerate helps in the design and analysis of efficient algorithms. The optimal bound on the maximal exponent of factors of the word has been studied by Thue and many other authors after him. One of the first discoveries was that an infinite binary word can avoid factors with an exponent larger than 2, called $2^+$-powers. This has been extended by Dejean [3] to the ternary alphabet, and her famous conjecture on the repetitive threshold for larger alphabets has eventually been proved after a series of partial results by different authors (see [9, 2] and references therein).

Another constraint is considered by Fraenkel and Simpson [4]: their parameter for the complexity of binary infinite words is the number of distinct squares occurring in them, regardless of the number of occurrences. It is fairly straightforward to check that no infinite binary word can contain fewer than three squares, and they proved that some of them contain exactly three. Two of these squares appear in the cubes 000 and 111, so that the maximal exponent is 3 in their word. In this article we produce an infinite word with few distinct squares and a smaller maximal exponent.

Fraenkel and Simpson's proof uses a pair of morphisms, one to get an infinite word by iteration, the other to produce the final translation on the binary alphabet. Their result has been proved with different pairs of morphisms by Rampersad et al. [5] (the first morphism is uniform), by Harju and Nowotka [5] (the second morphism accepts any infinite square-free word), and by Badkobeh and Crochemore [1] (the simplest morphisms).

In this article we show that the two types of constraints can be combined for the binary alphabet: we produce an infinite word in which the maximal exponent of factors is the smallest possible while the number of squares is also the smallest possible. The maximal exponent is 7/3 and the number of squares is 12, to which can be added two words of exponent 7/3.

It is known from Karhumäki and Shallit [6] that if an infinite binary word avoids 7/3-powers it contains an infinite number of squares. Proving that it contains more than 12 squares is indeed a matter of simple computation.

Shallit [10] has built an infinite binary word avoiding $(7/3)^+$-powers and all squares of period at least 7. His word contains 18 squares.

Our infinite binary word avoids the same powers but contains only 12 squares, 
the largest having period 8. As before the proof relies on a pair of morphisms 
satisfying suitable properties. Both morphisms are almost uniform (up to one 
unit). The first morphism is weakly square-free on a 6-letter alphabet, and the 
second does not even correspond to a uniquely-decipherable code but admits 
a unique decoding on the words produced by the first. To get the morphisms, 
we first examined carefully the structure of long words satisfying the conditions 
and obtained by backtracking computation. Then, we inferred the morphisms 
from the regularities found in the words. 

After introducing the definitions and main results in the next section, we provide a weakly square-free morphism and the infinite square-free word on 6 letters it generates in Section 3. Section 4 shows how this word is translated into an infinite binary word satisfying the constraints. In the conclusion we define the new notion of finite-repetition threshold and state a conjecture on its value for the 3-letter alphabet.

2 Repetitions in binary words



A word is a sequence of letters drawn from a finite alphabet. We consider the binary alphabet $B = \{0, 1\}$, the ternary alphabet $A_3 = \{\mathtt{a}, \mathtt{b}, \mathtt{c}\}$, and the 6-letter alphabet $A_6 = \{\mathtt{a}, \mathtt{b}, \mathtt{c}, \mathtt{d}, \mathtt{e}, \mathtt{f}\}$.

A square is a word of the form uu where u is a nonempty (finite) word. 
A word has period p if its letters at distance p are equal. The exponent of a 
nonempty word is the quotient of its length over its smallest period. Thus, a 
square is any word with an even integer exponent. 
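
The definitions above translate directly into code. The following is a minimal sketch; the function names `smallest_period`, `exponent` and `is_square` are illustrative and do not come from the article.

```python
from fractions import Fraction


def smallest_period(w: str) -> int:
    """Smallest p >= 1 such that w[i] == w[i + p] for every valid index i."""
    for p in range(1, len(w) + 1):
        if all(w[i] == w[i + p] for i in range(len(w) - p)):
            return p
    raise ValueError("the exponent is only defined for nonempty words")


def exponent(w: str) -> Fraction:
    """Length of w divided by its smallest period, as an exact fraction."""
    return Fraction(len(w), smallest_period(w))


def is_square(w: str) -> bool:
    """True when w = uu for some nonempty word u."""
    half, rest = divmod(len(w), 2)
    return len(w) > 0 and rest == 0 and w[:half] == w[half:]
```

For instance, `exponent("0110110")` evaluates to `Fraction(7, 3)`, since the word has length 7 and smallest period 3.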

In this article we consider infinite binary words in which a small number of 
squares occur. 

The maximal length of a binary word containing fewer than three squares is finite. It can be checked that it is 18, e.g. 010011000111001101 contains only 00 and 11. But, as recalled above, this length is infinite if 3 squares are allowed to appear in the word. A simple proof of it relies on two morphisms $f$ and $h_0$ defined as follows. The morphism $f$ is defined from $A_3^*$ to itself by

$$f(\mathtt{a}) = \mathtt{abc}, \qquad f(\mathtt{b}) = \mathtt{ac}, \qquad f(\mathtt{c}) = \mathtt{b}.$$

It is known that the infinite word $\mathbf{f} = f^\infty(\mathtt{a})$ it generates is square-free (see [Chapter 2]). The morphism $h_0$ is from $A_3^*$ to $B^*$ and defined by

$$h_0(\mathtt{a}) = 01001110001101, \qquad h_0(\mathtt{b}) = 0011, \qquad h_0(\mathtt{c}) = 000111.$$

Then the result is a consequence of the next statement. 

Theorem 1 ([1]) The infinite word $\mathbf{h_0} = h_0(f^\infty(\mathtt{a}))$ contains the 3 squares 00, 11 and 1010 only. The cubes 000 and 111 are the only factors occurring in $\mathbf{h_0}$ and of exponent larger than 2.
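
Both claims of this section can be checked experimentally on finite words. The sketch below lists the distinct squares of the 18-letter word quoted above and of a prefix of $h_0(f^\infty(\mathtt{a}))$; the helper names and the choice of 6 iterations are assumptions made for illustration, and a check on a finite prefix is of course not a proof of Theorem 1.

```python
F = {"a": "abc", "b": "ac", "c": "b"}
H0 = {"a": "01001110001101", "b": "0011", "c": "000111"}


def iterate(morphism, letter, steps):
    """Apply the morphism `steps` times, starting from a single letter."""
    word = letter
    for _ in range(steps):
        word = "".join(morphism[c] for c in word)
    return word


def distinct_squares(w):
    """Set of all factors of w of the form uu with u nonempty."""
    found = set()
    n = len(w)
    for p in range(1, n // 2 + 1):
        run = 0  # consecutive positions i with w[i] == w[i + p]
        for i in range(n - p):
            run = run + 1 if w[i] == w[i + p] else 0
            if run >= p:  # w[i-p+1 .. i+p] is a square of period p
                found.add(w[i - p + 1 : i + p + 1])
    return found


short_word = "010011000111001101"
binary_prefix = "".join(H0[c] for c in iterate(F, "a", 6))

print(sorted(distinct_squares(short_word)))
print(sorted(distinct_squares(binary_prefix)))
```

The scan is quadratic in the length of the word, which is enough for prefixes of a few thousand letters.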

It is impossible to avoid $2^+$-powers and keep a bounded number of squares. As proved by Karhumäki and Shallit [6], the exponent has to go up to 7/3 to allow the property.

In the two following sections we define two morphisms and derive the prop- 
erties that we need to prove the next statement. 

Theorem 2 There exists an infinite binary word whose factors have an exponent at most 7/3 and that contains 12 squares, the fewest possible.

Our infinite binary word contains the 12 squares $0^2$, $1^2$, $(01)^2$, $(10)^2$, $(001)^2$, $(010)^2$, $(011)^2$, $(100)^2$, $(101)^2$, $(110)^2$, $(01101001)^2$, $(10010110)^2$, and the two words 0110110 and 1001001 of exponent 7/3.

Proving that it is impossible to have fewer than 12 squares in the previous statement results from the next table. It has been obtained by pruned backtracking sequential computation that avoids exhaustive search. It shows the maximal length $\ell(s)$ of binary words whose factors have an exponent at most 7/3, for each number $s$ of squares, $0 \le s \le 11$.

s      0   1   2   3   4   5   6   7   8   9   10   11
ℓ(s)   3   5   8  12  14  18  24  30  37  43   83  116
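
A computation of this kind can be sketched as a depth-first search that extends a word one letter at a time and only inspects the repetitions ending at the new letter. This is a naive illustration, not the authors' pruned procedure; the names `suffix_run` and `longest_word` are hypothetical, and $s$ is read here as "at most $s$ distinct squares".

```python
def suffix_run(w, p):
    """Length of the longest suffix of w that has period p (assumes p <= len(w))."""
    t = p
    while t < len(w) and w[-1 - t] == w[-1 - t + p]:
        t += 1
    return t


def longest_word(max_squares):
    """Maximal length of a binary word whose factors have exponent at most 7/3
    and which contains at most `max_squares` distinct squares."""
    best = 0
    stack = [("", frozenset())]
    while stack:
        w, squares = stack.pop()
        best = max(best, len(w))
        for letter in "01":
            v = w + letter
            seen = set(squares)
            ok = True
            for p in range(1, len(v) // 2 + 1):
                t = suffix_run(v, p)
                if 3 * t > 7 * p:  # a suffix of exponent larger than 7/3
                    ok = False
                    break
                if t >= 2 * p:  # the suffix of length 2p is a square
                    seen.add(v[-2 * p:])
            if ok and len(seen) <= max_squares:
                stack.append((v, frozenset(seen)))
    return best
```

The search terminates only when the set of admissible words is finite, which the table asserts for $s \le 11$. For the two smallest values, `longest_word(0)` returns 3 and `longest_word(1)` returns 5, in agreement with the first two columns; larger values of $s$ take increasingly long with this naive version.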



3 A weakly square-free morphism on six letters 

In this section we consider a specific morphism used for the proof of Theorem 2. It is called $g$ and defined from $A_6^*$ to itself by:

$$\begin{cases}
g(\mathtt{a}) = \mathtt{abac},\\
g(\mathtt{b}) = \mathtt{babd},\\
g(\mathtt{c}) = \mathtt{eabdf},\\
g(\mathtt{d}) = \mathtt{fbace},\\
g(\mathtt{e}) = \mathtt{bace},\\
g(\mathtt{f}) = \mathtt{abdf}.
\end{cases}$$

We prove below that the morphism is weakly square-free in the sense that $\mathbf{g} = g^\infty(\mathtt{a})$ is an infinite square-free word, that is, all its finite factors have an exponent smaller than 2. Note, however, that $g$ itself is not square-free since, for example, $g(\mathtt{cf}) = \mathtt{eabdfabdf}$ contains the square $(\mathtt{abdf})^2$. This prevents us from using a characterisation of square-freeness of the morphism, or equivalently of the fixed points of the morphism. As far as we know only an ad hoc proof is possible.

The set of codewords $g(a)$ ($a \in A_6$) is a prefix code and therefore a uniquely-decipherable code. Note also that any occurrence of $\mathtt{abac}$ in $g(w)$, for $w \in A_6^*$, uniquely corresponds to an occurrence of $\mathtt{a}$ in $w$. The proof below relies on the fact that not all doublets and triplets (words of length 2 and 3 respectively) occur in $\mathbf{g}$, as the next statements show.

Lemma 1 The set of doublets occurring in $\mathbf{g}$ is

D = {ab, ac, ba, bd, cb, ce, da, df, ea, fb}.

Proof. Note that all letters of $A_6$ appear in $\mathbf{g}$. Then doublets ab, ac, ba, bd, ce, df, ea, fb appear in $\mathbf{g}$ because they appear in the images of one letter. The images of these doublets generate two more doublets, cb and da, whose images do not create new doublets. ■

Lemma 2 The set of triplets in $\mathbf{g}$ is

T = {aba, abd, acb, ace, bab, bac, bda, bdf, cba, cea, dab, dfb, eab, fba}.

Proof. Triplets appear in the images of a letter or of a doublet. Triplets found in images of one letter are: aba, abd, ace, bab, bac, bdf, eab, fba. The images of doublets occurring in $\mathbf{g}$, in set D of Lemma 1, contain the extra triplets: acb, bda, cba, cea, dab, dfb. ■
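
The two lemmas can be cross-checked mechanically by collecting the factors of length 2 and 3 in a prefix of $g^\infty(\mathtt{a})$. The sketch below does this for $g^5(\mathtt{a})$; the helper names and the number of iterations are illustrative assumptions, and the check on a prefix complements the closure argument of the proofs rather than replacing it.

```python
G = {"a": "abac", "b": "babd", "c": "eabdf",
     "d": "fbace", "e": "bace", "f": "abdf"}


def iterate(morphism, letter, steps):
    """Apply the morphism `steps` times, starting from a single letter."""
    word = letter
    for _ in range(steps):
        word = "".join(morphism[c] for c in word)
    return word


def factors(word, length):
    """Set of all factors of the given length."""
    return {word[i:i + length] for i in range(len(word) - length + 1)}


prefix = iterate(G, "a", 5)
doublets = factors(prefix, 2)
triplets = factors(prefix, 3)

print(sorted(doublets))
print(sorted(triplets))
```

Since $g^3(\mathtt{a})$ already contains the images of all six letters and of the doublets needed in the proofs, a prefix of this size exhibits every element of D and T.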

Table 1: Gaps of abac: words between consecutive occurrences of abac in $\mathbf{g}$. They are images of gaps between consecutive occurrences of a.

gap        word                  length
g(b)     = babd                   4
g(cb)    = eabdfbabd              9
g(bd)    = babdfbace              9
g(ce)    = eabdfbace              9
g(bdfb)  = babdfbaceabdfbabd     17

To prove that the infinite word $\mathbf{g}$ is square-free we first show that it contains no square with fewer than four occurrences of the word $g(\mathtt{a}) = \mathtt{abac}$. Then, we show it contains no square with at least four occurrences of it. The word abac is chosen because its occurrences in $\mathbf{g}$ correspond to $g(\mathtt{a})$ only, so they are used to synchronise the parsing of the word according to the codewords $g(a)$.

Lemma 3 No square in $\mathbf{g}$ can contain fewer than four occurrences of abac.

Proof. Assume by contradiction that a square $ww$ in $\mathbf{g}$ contains fewer than four occurrences of abac. Let $x$ be the shortest word whose image by $g$ contains $ww$.

Then $x$ is a factor of $\mathbf{g}$ that belongs to the set $\mathtt{a}\,((A_6 \setminus \{\mathtt{a}\})^*\,\mathtt{a})^5$. Since two consecutive occurrences of a in $\mathbf{g}$ are separated by a string of length at most 4 (the largest such string is indeed bdfb, as a consequence of the preceding lemmas), the set is finite.

The square-freeness of the images of all these factors has been checked via an elementary implementation of the test, which proves the result. ■
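
A check in the spirit of this proof can be sketched as follows: take every window of a prefix of $g^\infty(\mathtt{a})$ that spans six consecutive occurrences of a, apply $g$, and test the image for squares naively. The prefix $g^5(\mathtt{a})$ and all helper names are assumptions made for illustration; the authors' own implementation is not given in the article, and this version only covers the windows that actually occur in the chosen prefix.

```python
G = {"a": "abac", "b": "babd", "c": "eabdf",
     "d": "fbace", "e": "bace", "f": "abdf"}


def apply(morphism, word):
    return "".join(morphism[c] for c in word)


def iterate(morphism, letter, steps):
    word = letter
    for _ in range(steps):
        word = apply(morphism, word)
    return word


def has_square(w):
    """Naive test: does w contain a factor uu with u nonempty?"""
    n = len(w)
    return any(w[i:i + p] == w[i + p:i + 2 * p]
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p + 1))


prefix = iterate(G, "a", 5)
positions = [i for i, c in enumerate(prefix) if c == "a"]

# Longest gap between two consecutive occurrences of the letter a.
max_gap = max(q - p - 1 for p, q in zip(positions, positions[1:]))

# Words of the form a (gap a)^5 that occur in the prefix.
windows = {prefix[positions[j]:positions[j + 5] + 1]
           for j in range(len(positions) - 5)}

offending = [x for x in sorted(windows) if has_square(apply(G, x))]
print(max_gap, len(windows), offending)
```

An empty `offending` list means that no image of such a window contains a square.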

Proposition 1 No square in g can contain at least four occurrences of abac. 

Proof. The proof is by contradiction: let $k$ be the maximal integer for which $g^k(\mathtt{a})$ is square-free and let $ww$ be a square occurring in $g^{k+1}(\mathtt{a})$ and containing at least 4 occurrences of abac. Distinguishing several cases according to the words between consecutive occurrences of abac (see Table 1), we deduce that $g^k(\mathtt{a})$ is not square-free, the contradiction.

The square $ww$ can be written

$$v_0(\mathtt{abac} \cdots \mathtt{abac})u_1 \; v_1(\mathtt{abac} \cdots \mathtt{abac})u_2$$

where $v_0$, $u_1$, $v_1$, and $u_2$ contain no occurrence of abac. It occurs in the image of a factor of $\mathbf{g}$. The central part of $w$ starting and ending with abac is the image of a unique word $U$, factor of $g^k(\mathtt{a})$, due to the code property:

$$g(U) = v_0^{-1} w u_1^{-1} = v_1^{-1} w u_2^{-1}.$$

We split the proof in two parts according to whether abac occurs in $u_1v_1$ or not.

No abac in $u_1v_1$. We consider five cases according to the value of $u_1v_1$, the gap of abac (see Table 1).

1. $u_1v_1 = \mathtt{babd}$ corresponds to $g(\mathtt{b})$ only. If either $u_1$ or $v_1$ is empty, then $v_0$ or $u_2$ is $g(\mathtt{b})$; in either case we get $\mathtt{b}U\mathtt{b}U$ or $U\mathtt{b}U\mathtt{b}$, which are squares. Else $v_0$ has a suffix d so it belongs to $g(\mathtt{b})$, and again $\mathtt{b}U\mathtt{b}U$ is a square in $\mathbf{g}$.

2. $u_1v_1 = \mathtt{eabdfbabd}$ corresponds to $g(\mathtt{cb})$ only. An occurrence of cb always belongs to $g(\mathtt{ab})$, therefore $U$ has a prefix abd and a suffix aba, and the letter after aba is c. If $v_1$ is empty, $u_1$ has a prefix eabdfbabd so it is $g(\mathtt{cb})$ and again $U\mathtt{cb}U\mathtt{cb}$ is a square. If $v_1$ is not empty then $v_0$ has a suffix d, suffix of $g(\mathtt{b})$, therefore $\mathtt{b}U\mathtt{cb}U\mathtt{c}$ is a square.

3. $u_1v_1 = \mathtt{babdfbace}$ corresponds to $g(\mathtt{bd})$. The word abda is a factor of $g(\mathtt{ba})$ only, so $U$ has a prefix aba and a suffix ba. If $|u_1| = 0$, $v_0 = \mathtt{babdfbace}$ can only be $g(\mathtt{bd})$ so $\mathtt{bd}U\mathtt{bd}U$ is a square. Otherwise $u_1$ must have a prefix b; since $U$ has a suffix ba the next letter after it is either b or c; as only $g(\mathtt{b})$ is prefixed by b the letter is b, so $u_2$ has a prefix or is a prefix of $g(\mathtt{b})$, and we know that bab is always followed by d, thus $U\mathtt{bd}U\mathtt{bd}$ is a square.

4. $u_1v_1 = \mathtt{eabdfbace}$ corresponds to $g(\mathtt{ce})$ only. If $u_1$ is empty, $v_0$ is $g(\mathtt{ce})$ so $\mathtt{ce}U\mathtt{ce}U$ is a square. Otherwise, $u_2$ has a prefix or is a prefix of $g(\mathtt{c})$; the next letter after c is either b or e (see Lemma 1); if it is b the right-most $U$ has a suffix aba but the left-most $U$ has a suffix fba, which cannot be. Therefore the letter after c is e and $U\mathtt{ce}U\mathtt{ce}$ is a square.

5. $u_1v_1 = \mathtt{babdfbaceabdfbabd}$. If $|v_1| > 12$, $v_0$ has a suffix $g(\mathtt{dfb})$ and the letter before it is b, so $\mathtt{bdfb}U\mathtt{bdfb}U$ is a square. If $0 \le |v_1| \le 12$, then $|u_1| \ge 5$, so $u_2$ has a prefix or is a prefix of $g(\mathtt{bd})$, so the next letter is either a or f. If it is a the right-most $U$ has a suffix ba but $v_0$ is a suffix of or has a suffix $g(\mathtt{b})$; it is preceded by either $g(\mathtt{c})$ or $g(\mathtt{f})$; if it is $g(\mathtt{c})$ then $U$ has a prefix abd and bdfbabd is from the concatenation of $g(\mathtt{c})$ and $g(\mathtt{b})$ or $g(\mathtt{dfb})$; in either case the left occurrence of $U$ will have ea as a suffix, a contradiction since $\mathtt{fb}U\mathtt{bdfb}U\mathtt{bd}$ and $U\mathtt{bdfb}U\mathtt{bdfb}$ are both squares.

An occurrence of abac in $u_1v_1$. Then the suffix of $u_1$ is either aba, ab or a while the respective prefix of $v_1$ is c, ac or bac.

Note that c is followed either by b or e (Lemma 1) and that cb occurs only in the image of ab. Then if the occurrence of abac is followed by b, the occurrence of cb in $v_0$ is preceded by aba, and then there is a square starting 1, 2 or 3 positions before the occurrence of $ww$, which brings us back to the first case. Therefore, abac is followed by e.

The occurrence of abace comes from $g(\mathtt{ac})$, and by Lemma 2 $u_1v_1$ contains an occurrence of $g(\mathtt{bac})$. So, the occurrence of abace is preceded by d, and since da occurs only in the image of ba, the occurrence of da in $u_1$ is followed by bac, which yields a square starting 1, 2 or 3 positions after the occurrence of $ww$. Again this takes us back to the first case.

In all cases we deduce the existence of a square in $g^k(\mathtt{a})$, which is a contradiction with the definition of $k$. Therefore there is no square in $\mathbf{g}$ containing at least four occurrences of abac. ■



The next corollary is a direct consequence of Lemma 3 and Proposition 1.

Corollary 1 The infinite word g is square-free, or equivalently, the morphism 
g is weakly square-free. 

4 Binary translation 

The second part of the proof of Theorem 2 consists in showing that the special infinite square-free word on 6 letters introduced in the previous section can be transformed into the desired binary word. This is done with a second morphism $h$ from $A_6^*$ to $B^*$ defined by

$$\begin{cases}
h(\mathtt{a}) = 10011,\\
h(\mathtt{b}) = 01100,\\
h(\mathtt{c}) = 01001,\\
h(\mathtt{d}) = 10110,\\
h(\mathtt{e}) = 0110,\\
h(\mathtt{f}) = 1001.
\end{cases}$$



Note that the codewords of $h$ do not form a prefix code, nor a suffix code, nor even a uniquely-decipherable code! We have for example $h(\mathtt{ae}) = 10011 \cdot 0110 = 1001 \cdot 10110 = h(\mathtt{fd})$. However, parsing the word $h(y)$ when $y$ is a factor of $\mathbf{g}$ is unique due to the absence of some doublets and triplets in it (see Lemmas 1 and 2). For example fd does not occur, which induces the unique parsing of 100110110 as $10011 \cdot 0110$.
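
The collision and the uniqueness of the parsing on factors of the 6-letter word can be illustrated as follows. The sketch checks that $h$ maps distinct short factors of a prefix of $g^\infty(\mathtt{a})$ to distinct binary words; the prefix $g^4(\mathtt{a})$, the bound 6 on the factor length and the helper names are assumptions chosen for illustration.

```python
G = {"a": "abac", "b": "babd", "c": "eabdf",
     "d": "fbace", "e": "bace", "f": "abdf"}
H = {"a": "10011", "b": "01100", "c": "01001",
     "d": "10110", "e": "0110", "f": "1001"}


def apply(morphism, word):
    return "".join(morphism[c] for c in word)


def iterate(morphism, letter, steps):
    word = letter
    for _ in range(steps):
        word = apply(morphism, word)
    return word


def factors(word, length):
    return {word[i:i + length] for i in range(len(word) - length + 1)}


def injective_on_factors(word, max_len):
    """True when distinct factors of `word` of length <= max_len have distinct images by h."""
    preimage = {}
    for n in range(1, max_len + 1):
        for y in factors(word, n):
            if preimage.setdefault(apply(H, y), y) != y:
                return False
    return True


prefix = iterate(G, "a", 4)
print(apply(H, "ae"), apply(H, "fd"))      # the same binary word
print("fd" in factors(prefix, 2))          # the doublet fd does not occur
print(injective_on_factors(prefix, 6))
```

The ambiguity of the code therefore never shows up between two factors of the prefix, which is the property the proof of Proposition 2 relies on.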

Proposition 2 The infinite word $\mathbf{h} = h(g^\infty(\mathtt{a}))$ contains no factor of exponent larger than 7/3. It contains the 12 squares $0^2$, $1^2$, $(01)^2$, $(10)^2$, $(001)^2$, $(010)^2$, $(011)^2$, $(100)^2$, $(101)^2$, $(110)^2$, $(01101001)^2$, $(10010110)^2$ only. Words 0110110 and 1001001 are the only factors with an exponent larger than 2.
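
The statement can be tested on a finite prefix of the binary word. The sketch below scans $h(g^4(\mathtt{a}))$ for all squares and for the maximal exponent of a factor; the helper `repetitions` and the choice of 4 iterations are illustrative assumptions, and such a finite check does not replace the proof.

```python
from fractions import Fraction

G = {"a": "abac", "b": "babd", "c": "eabdf",
     "d": "fbace", "e": "bace", "f": "abdf"}
H = {"a": "10011", "b": "01100", "c": "01001",
     "d": "10110", "e": "0110", "f": "1001"}

TWELVE = {"00", "11", "0101", "1010", "001001", "010010", "011011",
          "100100", "101101", "110110",
          "0110100101101001", "1001011010010110"}


def iterate(morphism, letter, steps):
    word = letter
    for _ in range(steps):
        word = "".join(morphism[c] for c in word)
    return word


def repetitions(w):
    """Return (set of distinct squares, maximal exponent) over all factors of w."""
    squares = set()
    num, den = 1, 1  # best ratio length/period seen so far
    n = len(w)
    for p in range(1, n):
        run = 0  # consecutive positions i with w[i] == w[i + p]
        for i in range(n - p):
            if w[i] != w[i + p]:
                run = 0
                continue
            run += 1
            # w[i-run+1 .. i+p] has length run + p and period p
            if (run + p) * den > num * p:
                num, den = run + p, p
            if run >= p:
                squares.add(w[i - p + 1 : i + p + 1])
    return squares, Fraction(num, den)


binary = "".join(H[c] for c in iterate(G, "a", 4))
found, top = repetitions(binary)
print(sorted(found - TWELVE), top)
```

Every square found in the prefix belongs to the list of the proposition, and the maximal exponent observed is 7/3; a short prefix need not exhibit all 12 squares.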

The proof is based on the fact that occurrences of 10011 in $\mathbf{h}$ identify occurrences of a in $\mathbf{g}$ and on the unique parsing mentioned above. It proceeds by considering several cases according to the gaps between consecutive occurrences of 10011 (see Table 2), associated with gaps between consecutive occurrences of a in $\mathbf{g}$, which leads to analysing paths in the graph of Figure 1.

Proof. We show that if $\mathbf{h}$ contained a square not in the list, it would come from a square in $\mathbf{g}$, which cannot be since $\mathbf{g}$ is square-free (Corollary 1).

Suppose $\mathbf{h}$ contains the square $w^2$. It is a factor of $h(g^k(\mathtt{a}))$, for some integer $k$, and can be written $v_0(h(\mathtt{a}) \cdots h(\mathtt{a}))u_1 \; v_1(h(\mathtt{a}) \cdots h(\mathtt{a}))u_2$. The central part of $w$ is the image of a unique square-free factor $U$ of $g^k(\mathtt{a})$ due to the unique parsing mentioned above:

$$h(U) = (h(\mathtt{a}) \cdots h(\mathtt{a})) = v_0^{-1} w u_1^{-1} = v_1^{-1} w u_2^{-1}.$$

We proceed through different cases as in the proof of Proposition 1.

Figure 1: Graph showing immediate successors of gaps in the word $\mathbf{g}$: a suffix of it following an occurrence of a is the label of an infinite path.

Table 2: Gaps between consecutive occurrences of 10011 in $\mathbf{h}$.

gap        word                    length
h(b)     = 01100                    5
h(cb)    = 0100101100              10
h(bd)    = 0110010110              10
h(ce)    = 010010110                9
h(bdfb)  = 0110010110100101100     19

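No $h(\mathtt{a})$ in $u_1v_1$.

1. $u_1v_1 = 01100$ corresponds to $h(\mathtt{b})$ only.

If $|v_1| > 1$, then $v_0$ belongs to $h(\mathtt{b})$ and $\mathtt{b}U\mathtt{b}U$ is a square. Else $|u_1| \ge 4$ so $u_2$ belongs to $h(\mathtt{b})$; it cannot belong to $h(\mathtt{e})$ since ae is not a factor of $\mathbf{g}$, therefore $U\mathtt{b}U\mathtt{b}$ is a square of $\mathbf{g}$.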

2. $u_1v_1 = 0110010110$ corresponds to $h(\mathtt{bd})$.

$$v_0 \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, h(\mathtt{bd}) \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, u_2$$

The word abda is a factor of $g(\mathtt{ba})$ only, so $U$ has a prefix abac and a suffix ba (note that $U$ cannot be aba since ababdaba is not a factor of $\mathbf{g}$).

$$v_0 \,(h(\mathtt{abac}) \cdots h(\mathtt{ba}))\, h(\mathtt{bd}) \,(h(\mathtt{abac}) \cdots h(\mathtt{ba}))\, u_2$$

If $u_2$ comes from or has a prefix $h(\mathtt{b})$ then the letter after bab is always d so we have the square $U\mathtt{bd}U\mathtt{bd}$. Then $u_2$ is a prefix of or has a prefix $h(\mathtt{c})$; the longest common prefix (LCP) of $h(\mathtt{c})$ and $h(\mathtt{b})$ is 01, so $v_0$ has a suffix 10010110, which is a suffix of $h(\mathtt{bd})$ or $h(\mathtt{ce})$. If $v_0$ comes from $h(\mathtt{bd})$ then we have the square $\mathtt{bd}U\mathtt{bd}U$. So $v_0$ is a suffix of $h(\mathtt{ce})$:

$$h(\mathtt{ce}) \,(h(\mathtt{abac}) \cdots h(\mathtt{ba}))\, h(\mathtt{bd}) \,(h(\mathtt{abac}) \cdots h(\mathtt{ba}))\, h(\mathtt{c}).$$

[Figure: the two tries built from the constraints above.]

The sign XX shows that the particular branch of the trie terminates because either a square occurs or the sequence is not a factor of $\mathbf{g}$. The sign X on the other hand represents the termination of a particular branch as a consequence of the discontinuation of the corresponding branch in the other trie. If we continue these tries we will have:

ce abac babd fbace abdf babd abac eabdf bace abac babd abac eabdf ... ba

bd abac babd fbace abdf babd abac eabdf bace abac babd abac eabdf ... ba c

which is the image of

eabdf bace abac ... abac babd fbace abac ... abac e

itself image of

ce a ... a bd a ... a c

so we have the same situation as at the starting point; but $U$ is shorter in this case, therefore if we continue this process we should have

ce abac babd fbace abdf babd abac babd fbace abdf bace a

but abdf bace is the image of fe, which is not in D (Lemma 1).

3. $u_1v_1 = 0100101100$ corresponds to $h(\mathtt{cb})$.

The word acba is a factor of $g(\mathtt{ab})$ only, so $U$ has a prefix abd and a suffix aba:

$$v_0 \,(h(\mathtt{abd}) \cdots h(\mathtt{aba}))\, h(\mathtt{cb}) \,(h(\mathtt{abd}) \cdots h(\mathtt{aba}))\, u_2$$

The word $u_2$ comes from or has a prefix $h(\mathtt{c})$. If the letter after it is b, we have the square $U\mathtt{cb}U\mathtt{cb}$.

Otherwise $u_2$ comes from or has a prefix $h(\mathtt{ce})$. If $v_0$ comes from or has a suffix $h(\mathtt{b})$ then we have the square $\mathtt{b}U\mathtt{cb}U\mathtt{c}$.

Therefore the letter before $U$ is e preceded by c, i.e. the string before the left $U$ is ce:

$$h(\mathtt{ce}) \,(h(\mathtt{abd}) \cdots h(\mathtt{aba}))\, h(\mathtt{cb}) \,(h(\mathtt{abd}) \cdots h(\mathtt{aba}))\, h(\mathtt{ce}).$$

[Figure: the two tries built from the constraints above.]

Now we have the same situation as in the previous case

$$h(g(\mathtt{ce})) \,(h(g(\mathtt{abac})) \cdots h(g(\mathtt{ba})))\, h(g(\mathtt{bd})) \,(h(g(\mathtt{abac})) \cdots h(g(\mathtt{ba})))\, h(g(\mathtt{c})).$$

4. $u_1v_1 = 010010110$ corresponds to $h(\mathtt{ce})$ only.

Before c is always ba (Lemma 2) and after e is ab (Lemma 2), so ab is a prefix of $U$ and ba is a suffix of $U$:

$$v_0 \,(h(\mathtt{ab}) \cdots h(\mathtt{ba}))\, h(\mathtt{ce}) \,(h(\mathtt{ab}) \cdots h(\mathtt{ba}))\, u_2.$$

(i): $u_2$ belongs to $h(\mathtt{cb})$ since we cannot have $U\mathtt{ce}U\mathtt{ce}$ and the letter after c is b or e (Lemma 1):

$$v_0 \,(h(\mathtt{ab}) \cdots h(\mathtt{ba}))\, h(\mathtt{ce}) \,(h(\mathtt{ab}) \cdots h(\mathtt{ba}))\, h(\mathtt{cb})$$

The letter before bacb is a so:

$$v_0 \,(h(\mathtt{ab}) \cdots h(\mathtt{aba}))\, h(\mathtt{ce}) \,(h(\mathtt{ab}) \cdots h(\mathtt{aba}))\, h(\mathtt{cb}).$$

Note: $U$ is not aba since abaceabacb is not a factor of $g^k(\mathtt{a})$.

Now abace is a prefix of the image of ac so $U$ has a prefix abdf and the word before it is either ce or b; the first choice gives the square $\mathtt{ce}U\mathtt{ce}U$ and the second choice:

$$h(\mathtt{b}) \,(h(\mathtt{abdf}) \cdots h(\mathtt{aba}))\, h(\mathtt{ce}) \,(h(\mathtt{abdf}) \cdots h(\mathtt{aba}))\, h(\mathtt{cb}).$$

[Figure: the two tries built from the constraints above.]

Now if we continue the above tries we get:

b abdf bace abac babd abac eabdf babd abac babd fbace abdf ba ... ba

ce abdf bace abac babd abac eabdf babd abac babd fbace abdf ba ... ba cb

which is the image of

bd abacbabdfbace abdf ... bace abacbabdfbace abdf ... bab.

This is the same situation as the next case and we will see that after going one step back it brings us back to this case again. Now we are exactly in the same situation as at the beginning except that the length of the word $X = \mathtt{abdf} \ldots \mathtt{a}$ is shorter than $U$. Repeating this process enough times we should see that the word

babd fbace abac babd abac eabdf bace abac babd aba

which is the image of bdabaceaba, is not a factor of $g^k(\mathtt{a})$.

(ii): $u_2$ belongs to $h(\mathtt{b})$ (the LCP of $h(\mathtt{c})$ and $h(\mathtt{b})$ is 01) so $v_0$ must have a suffix 0010110, which belongs to $h(\mathtt{bd})$ because if it belongs to $h(\mathtt{ce})$ then $\mathtt{ce}U\mathtt{ce}U$ is a square.

$$h(\mathtt{bd}) \,(h(\mathtt{ab}) \cdots h(\mathtt{ba}))\, h(\mathtt{ce}) \,(h(\mathtt{ab}) \cdots h(\mathtt{ba}))\, h(\mathtt{b}).$$

[Figure: the trie built from the constraints above.]

Continuing this trie we have

bd abac babd fbace a ... bace abac babd fbace a ... babd.

This is a factor of $g(\mathtt{b\,abdf} \ldots \mathtt{a\,ce\,abdf} \ldots \mathtt{a\,cb})$, which is the previous case.

5. $u_1v_1 = 0110010110100101100$ corresponds to $h(\mathtt{bdfb})$ only. This case is dealt with by the same method.

$$v_0 \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, h(\mathtt{bdfb}) \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, u_2.$$

If $u_2$ belongs to $h(\mathtt{c})$, the LCP of $h(\mathtt{c})$ and $h(\mathtt{b})$ is 01 so $v_0$ must have a suffix 10010110100101100, therefore $v_0$ belongs to $h(\mathtt{bdfb})$. But $\mathtt{bdfb}U\mathtt{bdfb}U$ is a square and a factor of $g^k(\mathtt{a})$; a contradiction, so $u_2$ belongs to or has a prefix $h(\mathtt{b})$. We have two choices here.

(i): the next word after the right occurrence of $U$ is ba. The LCP of $h(\mathtt{bd})$ and $h(\mathtt{ba})$ is 10, $v_0$ has a suffix 110100101100, so it either belongs to $h(\mathtt{dfb})$ or $h(\mathtt{acb})$. The first case gives that $\mathtt{dfb}U\mathtt{b}\,\mathtt{dfb}U\mathtt{b}$ is a square and a factor of $g^k(\mathtt{a})$, a contradiction. So $v_0$ belongs to $h(\mathtt{acb})$:

$$h(\mathtt{acb}) \,(h(\mathtt{abda}) \cdots h(\mathtt{a}))\, h(\mathtt{bdfb}) \,(h(\mathtt{abda}) \cdots h(\mathtt{a}))\, h(\mathtt{ba}).$$

Prefixes and suffixes of $U$ are determined only by looking at D and T.

[Figure: the two tries built from the constraints above.]

We have:

abac babd abac eabdf bace ... abac babd fbace

abdf babd abac eabdf bace ... abac babd fbace abac

which is the image of

ab ace ... a bdfb ace ... a bda.

Now this is the next case so if we go back enough steps we should see that the length of $U$ decreases and at the end we get

ac babd abac eabdf babd abac eaba

but this is not a factor of $g^k(\mathtt{a})$, a contradiction.

(ii): the word after $U$ is bd. Now here the only possible letter after abd is a, since if it is f it is a prefix of fb so we have $U\mathtt{bdfb}U\mathtt{bdfb}$, a contradiction. As the LCP of $h(\mathtt{bdfb})$ and $h(\mathtt{bda})$ is 01100101101001, $v_0$ must have a suffix 01100 so it can belong to $h(\mathtt{ab})$ or $h(\mathtt{acb})$.

(I):

$$h(\mathtt{ab}) \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, h(\mathtt{bdfb}) \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, h(\mathtt{bda}).$$

Only using D, T and Figure 1 we can continue building $U$,

$$h(\mathtt{ab}) \,(h(\mathtt{ace}) \cdots h(\mathtt{ba}))\, h(\mathtt{bdfb}) \,(h(\mathtt{acea}) \cdots h(\mathtt{ba}))\, h(\mathtt{bda}).$$

Continuing further we get:

h(abac eabdf babd abac ... abac babd fbace abdf babd abac ... abac babda).

This is the image of

acb a ... bdfb a ... ba

and we are back to the case above.

(II):

$$h(\mathtt{acb}) \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, h(\mathtt{bdfb}) \,(h(\mathtt{a}) \cdots h(\mathtt{a}))\, h(\mathtt{bda}).$$

Using the same method we build the word $U$:

ac b abd ... ba bd f b ace ... ba bd a.

Here we cannot go further as $U$ cannot have abd and ace as prefixes at the same time.

An occurrence of $h(\mathtt{a})$ in $u_1v_1$. Looking at Figure 1, the images of the concatenation of two connected nodes (distance 1 arrow) are the possibilities for $u_1v_1h(\mathtt{a})$, but note that the second period of the square must start within $h(\mathtt{a})$, starting point of the arrow, otherwise it is one of the cases above. If the lengths of both nodes are larger than 2 then by unique parsing we are bound to have a square in $g^k(\mathtt{a})$ and get a contradiction. So we have to consider only the four cases where one of the nodes is ba:

1. $u_1v_1 = h(\mathtt{bacb}) = 01100100110100101100$, so $u_2$ must have a prefix $h(\mathtt{b})$ and $v_0$ a suffix of $h(\mathtt{cb})$; before cb is always a, so $\mathtt{acb}U\mathtt{b}\,\mathtt{acb}U\mathtt{b}$ is a square in $g^k(\mathtt{a})$.

2. $u_1v_1 = h(\mathtt{bace}) = 0110010011010010110$, so $u_2$ must have a prefix $h(\mathtt{b})$ and $v_0$ a suffix $h(\mathtt{ce})$; before ce is always a, so $\mathtt{ace}U\mathtt{b}\,\mathtt{ace}U\mathtt{b}$ is a square in $g^k(\mathtt{a})$.

3. $u_1v_1 = h(\mathtt{ceab}) = 0100101101001101100$, so $u_2$ must have a prefix of $h(\mathtt{ce})$ and $v_0$ a suffix of $h(\mathtt{b})$; after ce is always a, so $\mathtt{b}U\mathtt{cea}\,\mathtt{b}U\mathtt{cea}$ is a square in $g^k(\mathtt{a})$.

4. $u_1v_1 = h(\mathtt{bdab}) = 01100101101001101100$, so using tries as before shows that after enough backward iteration we should have

fbace abdf babd abac babd abac eabdf babd abac babd

which contains a square.

In all cases the conclusion is that we get a square in $g^k(\mathtt{a})$, a contradiction with the definition of $k$. This completes the proof of Proposition 2. ■

Theorem 2 follows immediately from Proposition 2.

5 Conclusion 

The constraint on the number of squares imposed on binary words slightly differs from the constraint considered by Shallit [10]. The squares occurring in his word have period smaller than 7. Our word contains fewer squares but their maximal period is 8.

Looking at repetitions in words on larger alphabets, the subject introduces a new type of threshold, which we call the finite-repetitions threshold (FRt). For the alphabet of $a$ letters, $\mathrm{FRt}(a)$ is defined as the smallest rational number for which there exists an infinite word avoiding $\mathrm{FRt}(a)^+$-powers and containing a finite number of $r$-powers, where $r$ is Dejean's repetitive threshold. The results of Karhumäki and Shallit as well as ours show that $\mathrm{FRt}(2) = 7/3$. Our result additionally proves that the associated minimal number of squares is 12.

Computation shows that the maximal length of a $(7/4)^+$-free ternary word with only one 7/4-repetition is 102. This leads us to state the following conjecture, which has been tested up to length 20000.

Conjecture 1 The finite-repetitions threshold of the 3-letter alphabet is 7/4 and the associated number of 7/4-powers is 2.

Values for larger alphabets remain to be explored. 